library(tidyverse)
library(babynames)
Error in library(babynames): there is no package called 'babynames'
library(dplyr)
knitr::opts_chunk$set(echo = TRUE)
Lai Wei
November 19, 2022
In this HW, I will use mmr_2015.csv, a data set that contains a subset of the (real) data used to generate the United Nations maternal mortality estimates published in 2015.
Background for mmr_2015.csv: The maternal mortality ratio (MMR) is defined as the number of maternal deaths per 100,000 live births. The UN maternal mortality estimation group produces estimates of the MMR for all countries in the world.
iso country year mmr
1 AFG Afghanistan 2000 1900.000
2 DZA Algeria 1999 117.410
3 BGD Bangladesh 2009 194.000
4 BGD Bangladesh 2000 322.156
5 BWA Botswana 2006 139.790
6 BWA Botswana 2012 147.900
7 BWA Botswana 2005 157.700
8 BWA Botswana 2010 163.000
9 BWA Botswana 2013 182.600
10 BWA Botswana 2007 183.470
11 BWA Botswana 2011 188.860
12 BWA Botswana 2009 189.570
13 BWA Botswana 2008 195.730
14 CMR Cameroon 2010 652.000
15 CHN China 2012 24.500
16 CHN China 2011 26.100
17 CHN China 2010 30.000
18 CHN China 2009 31.900
19 CHN China 2008 34.200
20 CHN China 2007 36.600
21 CHN China 2006 41.100
22 CHN China 2002 43.200
23 CHN China 2005 47.700
24 CHN China 2004 48.300
25 CHN China 2001 50.200
26 CHN China 2003 51.300
27 CHN China 2000 53.000
28 CHN China 1999 58.700
29 CHN China 1995 61.900
30 CHN China 1997 63.600
31 CHN China 1990 88.900
32 EGY Egypt 2006 59.000
33 EGY Egypt 2004 68.000
34 EGY Egypt 2000 84.000
35 EGY Egypt 1992 174.000
36 SLV El Salvador 2008 51.090
37 SLV El Salvador 2007 56.930
38 HND Honduras 2012 74.100
39 HND Honduras 2013 86.000
40 IND India 2012 167.000
41 IND India 2011 178.000
42 IND India 2008 212.000
43 IND India 2005 254.000
44 IND India 2002 301.000
45 IND India 2000 327.000
46 IND India 1998 398.000
47 IND India 1992 437.000
48 IND India 1999 540.000
49 IRN Iran (Islamic Republic of) 2013 19.700
50 IRN Iran (Islamic Republic of) 2012 19.900
51 IRN Iran (Islamic Republic of) 2008 20.900
52 IRN Iran (Islamic Republic of) 2011 21.500
53 IRN Iran (Islamic Republic of) 2006 21.700
54 IRN Iran (Islamic Republic of) 2010 22.100
55 IRN Iran (Islamic Republic of) 2005 23.800
56 IRN Iran (Islamic Republic of) 2004 24.100
57 IRN Iran (Islamic Republic of) 2007 24.700
58 IRN Iran (Islamic Republic of) 2009 25.400
59 IRN Iran (Islamic Republic of) 2002 27.400
60 IRN Iran (Islamic Republic of) 2003 28.300
61 IRQ Iraq 2012 35.000
62 JAM Jamaica 2009 73.200
63 JAM Jamaica 1999 73.500
64 JAM Jamaica 1995 81.300
65 JAM Jamaica 2012 81.300
66 JAM Jamaica 2004 82.500
67 JAM Jamaica 2003 89.800
68 JAM Jamaica 1998 90.100
69 JAM Jamaica 2000 90.300
70 JAM Jamaica 2001 91.500
71 JAM Jamaica 2007 92.900
72 JAM Jamaica 2011 95.700
73 JAM Jamaica 2006 96.700
74 JAM Jamaica 2008 102.000
75 JAM Jamaica 2005 109.200
76 JAM Jamaica 2002 110.500
77 JAM Jamaica 2010 113.300
78 MNG Mongolia 2014 30.200
79 MNG Mongolia 2013 42.600
80 MNG Mongolia 2010 47.400
81 MNG Mongolia 2008 48.600
82 MNG Mongolia 2011 48.700
83 MNG Mongolia 2012 51.500
84 MNG Mongolia 2006 67.200
85 MNG Mongolia 2009 81.000
86 MNG Mongolia 2007 88.300
87 MNG Mongolia 2005 92.700
88 MNG Mongolia 2004 96.700
89 MNG Mongolia 2003 107.200
90 MNG Mongolia 2002 121.500
91 MNG Mongolia 1997 143.500
92 MNG Mongolia 1998 162.400
93 MNG Mongolia 2001 165.000
94 MNG Mongolia 2000 166.200
95 MNG Mongolia 1996 173.700
96 MNG Mongolia 1999 182.000
97 MNG Mongolia 1995 186.000
98 MNG Mongolia 1992 203.900
99 MNG Mongolia 1994 219.000
100 MNG Mongolia 1993 259.000
101 MMR Myanmar 2004 315.860
102 NPL Nepal 2008 229.000
103 OMN Oman 1993 6.000
104 OMN Oman 1997 13.300
105 OMN Oman 1999 13.700
106 OMN Oman 2005 15.400
107 OMN Oman 1992 15.900
108 OMN Oman 2000 16.100
109 OMN Oman 1998 18.500
110 OMN Oman 2004 18.500
111 OMN Oman 1996 21.000
112 OMN Oman 1995 22.000
113 OMN Oman 2001 23.100
114 OMN Oman 2003 23.200
115 OMN Oman 1994 24.400
116 OMN Oman 1991 27.400
117 OMN Oman 2002 37.500
118 PRY Paraguay 2001 178.000
119 PER Peru 2011 92.700
120 PER Peru 2010 95.900
121 PER Peru 2009 96.100
122 PER Peru 2008 107.900
123 PER Peru 2007 110.500
124 PER Peru 2005 114.100
125 PER Peru 2006 114.900
126 PER Peru 2002 118.300
127 PER Peru 2004 120.800
128 PER Peru 2003 123.800
129 SAU Saudi Arabia 1997 23.000
130 SSD South Sudan 2005 2037.000
131 LKA Sri Lanka 2010 31.100
132 LKA Sri Lanka 2011 32.500
133 LKA Sri Lanka 2008 33.400
134 LKA Sri Lanka 2012 37.700
135 LKA Sri Lanka 2004 38.000
136 LKA Sri Lanka 2007 38.400
137 LKA Sri Lanka 2006 39.300
138 LKA Sri Lanka 2009 40.200
139 LKA Sri Lanka 2003 42.400
140 LKA Sri Lanka 2005 44.000
141 LKA Sri Lanka 2001 46.600
142 LKA Sri Lanka 1998 53.000
143 LKA Sri Lanka 2002 53.400
144 LKA Sri Lanka 2000 55.600
145 LKA Sri Lanka 1999 55.800
146 LKA Sri Lanka 1995 61.000
147 LKA Sri Lanka 1996 62.000
148 LKA Sri Lanka 1997 63.000
149 SDN Sudan 2009 215.600
150 SDN Sudan 2005 638.000
151 SYR Syrian Arab Republic 2008 56.000
152 THA Thailand 1998 36.400
153 THA Thailand 1997 36.500
154 THA Thailand 2005 37.400
155 THA Thailand 2006 41.600
156 THA Thailand 1996 44.100
157 THA Thailand 2004 44.500
158 TUN Tunisia 2008 44.800
159 TUN Tunisia 1993 68.900
160 TUR Turkey 2014 15.200
161 TUR Turkey 2012 15.400
162 TUR Turkey 2011 15.500
163 TUR Turkey 2013 15.900
164 TUR Turkey 2010 16.400
165 TUR Turkey 2009 18.400
166 TUR Turkey 2008 19.400
167 TUR Turkey 2007 21.200
168 ARE United Arab Emirates 2005 0.000
169 ARE United Arab Emirates 2000 0.000
170 ARE United Arab Emirates 2008 1.400
171 ARE United Arab Emirates 1995 22.400
172 ARE United Arab Emirates 1990 32.600
173 VNM Viet Nam 2009 69.000
174 VNM Viet Nam 2001 130.000
175 YEM Yemen 2012 148.160
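Variables in the data set mmr_2015.csv are as follows: iso (ISO code), country (country name), year (observation year), and mmr (observed maternal mortality ratio, defined as the number of maternal deaths per 100,000 live births).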
Construct a graph that shows the observed values of the MMR plotted against year (starting in 2000) for China and India. Use the pipe operator so that the graph follows from a multi-line command that starts with “mmr %>%”. Use ggplot() to display the data.
The babynames package contains the names of male and female babies born in the US from 1880 to 2017. The data are filtered to include only those rows with year > 1975, sex equal to male, and either prop > 0.025 or n > 50000.
Error in filter(., year > 1975, sex == "M", prop > 0.025 | n > 50000): object 'babynames' not found
Construct and print a tibble that shows the countries sorted by their average observed MMR (rounded to zero digits), with the country with the highest average MMR listed first.
# A tibble: 30 × 2
country ave
<chr> <dbl>
1 South Sudan 2037
2 Afghanistan 1900
3 Cameroon 652
4 Sudan 427
5 Myanmar 316
6 India 313
7 Bangladesh 258
8 Nepal 229
9 Paraguay 178
10 Botswana 172
# … with 20 more rows
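For each year, first calculate the mean observed value for each country, then rank countries by increasing MMR for each year.
Calculate the mean ranking across all years, extract the mean ranking for the 10 countries with the lowest ranking across all years, and print the resulting table.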
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
# A tibble: 10 × 2
country mean_rank
<chr> <dbl>
1 Turkey 1.12
2 Oman 1.2
3 United Arab Emirates 1.2
4 Iran (Islamic Republic of) 2
5 Saudi Arabia 2
6 Thailand 3.17
7 Sri Lanka 3.5
8 China 3.59
9 Iraq 4
10 Tunisia 4
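Do the same thing but now with rankings calculated separately for two periods, with period 1 referring to years < 2000 and period 2 referring to years >= 2000.
For each period, first calculate the mean observed value for each country, then rank countries by increasing MMR for each period.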
Calculate the mean ranking across all periods, extract the 10 countries with the lowest ranking across all periods, and print the table.
`summarise()` has grouped output by 'before_2000'. You can override using the
`.groups` argument.
# A tibble: 10 × 2
country mean_rank
<chr> <dbl>
1 Oman 2
2 Saudi Arabia 2
3 Turkey 2
4 United Arab Emirates 2
5 Iran (Islamic Republic of) 4
6 Iraq 5
7 China 6
8 Sri Lanka 6
9 Thailand 6
10 Tunisia 8
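Visualize the results for part a or b in a bar chart with countries on the y-axis and mean rank on the x-axis, with outcomes sorted by increasing rank for the selected countries.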
mmr %>%
mutate(before_2000 = year < 2000) %>%
group_by(before_2000, country) %>%
summarise(mean_obs = mean(mmr)) %>%
group_by(before_2000) %>%
mutate(rank = rank(mean_obs)) %>%
group_by(country) %>%
summarize(mean_rank = mean(rank))%>%
arrange(mean_rank) %>%
ungroup() %>%
slice(1:10) %>%
ggplot() +
geom_col(mapping = aes(x = fct_reorder(country, desc(mean_rank)),
y = mean_rank)) + theme_bw() +
scale_fill_brewer() + labs(x = "", y = "Mean rank") + coord_flip()
`summarise()` has grouped output by 'before_2000'. You can override using the
`.groups` argument.
---
title: "HW3"
author: "Lai Wei"
description: "gain experience with working with external data, dplyr, and the pipe operator."
date: "11/19/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- hw3
- Lai Wei
---
In this HW, I will use mmr_2015.csv, a data set that contains a subset of the (real) data used to generate the United Nations maternal mortality estimates published in 2015.
```{r}
#| label: setup
#| warning: false
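library(tidyverse)  # attaches dplyr, ggplot2, and forcats, among others
library(babynames)  # run install.packages("babynames") first if it is not installed
knitr::opts_chunk$set(echo = TRUE)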
```
## Read Data
Background for mmr_2015.csv:
The maternal mortality ratio (MMR) is defined as the number of maternal deaths per 100,000 live births. The UN maternal mortality estimation group produces estimates of the MMR for all countries in the world.
```{r}
mmr <- read.csv("_data/mmr_2015.csv")
mmr
```
Variables in the data set mmr_2015.csv are as follows:
- iso = ISO code
- country = country name
- year = observation year
- mmr = observed maternal mortality ratio, defined as (number of maternal deaths / number of live births) * 100,000
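As a quick illustration of the definition above, the ratio can be computed directly from counts. This is only a sketch: the helper name `mmr_ratio` and the counts are hypothetical and are not taken from mmr_2015.csv.

```r
# Hypothetical helper: maternal deaths per 100,000 live births.
mmr_ratio <- function(maternal_deaths, live_births) {
  maternal_deaths / live_births * 100000
}

# Made-up counts for illustration: 50 deaths among 25,000 live births.
example_mmr <- mmr_ratio(50, 25000)
example_mmr
```

With these made-up counts the ratio is 200, that is, 200 maternal deaths per 100,000 live births.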
## Data Visualization
Construct a graph that shows the observed values of the MMR plotted against year (starting in 2000) for China and India. Use the pipe operator so that the graph follows from a multi-line command that starts with “mmr %>%”. Use ggplot() to display the data.
```{r}
# Keep China and India from 2000 onward, then plot MMR against year.
mmr %>%
  filter(country %in% c("China", "India"), year >= 2000) %>%
  ggplot(aes(x = year, y = mmr, color = country)) +
  geom_point()
```
## Babynames
The babynames package contains the names of male and female babies born in the US from 1880 to 2017. The data are filtered to include only those rows with year > 1975, sex equal to male, and either prop > 0.025 or n > 50000.
```{r}
babynames %>%
filter(year > 1975, sex == "M",prop > 0.025|n > 50000) %>%
ggplot(aes(x = year, y = prop))+
geom_point(aes(group = name,color = name), size = 2)+
geom_line(aes(group = name, color = name))+
expand_limits(y = 0)
```
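To make the filter logic above concrete: conditions separated by commas in `filter()` are combined with AND, while `|` is an OR within a single condition. The sketch below applies the same conditions to a small made-up data frame (`toy_names` and its rows are hypothetical, only shaped like the babynames columns used above).

```r
library(dplyr)

# Made-up rows for illustration; not real babynames data.
toy_names <- data.frame(
  year = c(1970, 1980, 1980, 1990),
  sex  = c("M", "M", "F", "M"),
  name = c("Al", "Bo", "Cy", "Di"),
  n    = c(60000, 60000, 60000, 100),
  prop = c(0.030, 0.010, 0.030, 0.001),
  stringsAsFactors = FALSE
)

# A row is kept only if it passes all three comma-separated conditions;
# the last condition passes when either prop or n is large enough.
kept <- toy_names %>%
  filter(year > 1975, sex == "M", prop > 0.025 | n > 50000)

kept$name
```

Only "Bo" survives here: "Al" fails the year condition, "Cy" fails the sex condition, and "Di" satisfies neither the `prop` nor the `n` threshold.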
## Tidy Table
Construct and print a tibble that shows the countries sorted by their average observed MMR (rounded to zero digits), with the country with the highest average MMR listed first.
```{r}
mmr %>%
  group_by(country) %>%
  summarise(ave = round(mean(mmr), 0)) %>%
  arrange(desc(ave))
```
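As a hand check of one row of the table, the sketch below recomputes Sudan's average from its two observations as printed in the data (215.6 in 2009 and 638.0 in 2005). The variable names are illustrative only. It also shows how `round()` treats exact halves, which can matter when rounding to zero digits.

```r
# Sudan's two observed MMR values, copied from the printed data.
sudan_mmr <- c(215.6, 638.0)

# Mean of the observations, rounded to zero digits.
sudan_ave <- round(mean(sudan_mmr), 0)
sudan_ave

# R rounds exact halves to the nearest even number.
halves <- round(c(2.5, 3.5), 0)
halves
```

The result for Sudan is 427, which agrees with the table above.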
## Continuing with the mmr data set
### Part a:
For each year
- first calculate the mean observed value for each country
- then rank countries by increasing MMR for each year.
Calculate the mean ranking across all years, extract the mean ranking for the 10 countries with the lowest ranking across all years, and print the resulting table.
```{r}
mmr %>%
group_by(year, country) %>%
summarise(mean_obs = mean(mmr)) %>%
group_by(year) %>%
mutate(rank = rank(mean_obs)) %>%
group_by(country) %>%
summarize(mean_rank = mean(rank)) %>%
arrange(mean_rank) %>%
slice(1:10)
```
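The ranking step relies on two properties of base R's `rank()`: the smallest value receives rank 1, and tied values share the average of the ranks they would otherwise occupy (the default `ties.method = "average"`). A minimal sketch on made-up values for four hypothetical countries in a single year:

```r
# Made-up mean MMR values for four hypothetical countries in one year.
mean_obs <- c(A = 20, B = 15, C = 20, D = 90)

# B has the smallest value and gets rank 1; A and C are tied for
# ranks 2 and 3, so each receives 2.5; D gets rank 4.
ranks <- rank(mean_obs)
ranks
```

Note that in the pipeline above each year's ranks run only over the countries observed in that year, so a country with a single observation in a sparsely observed year can receive a low rank.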
### Part b:
Do the same thing but now with rankings calculated separately for two periods, with
- period 1 referring to years < 2000
- period 2 referring to years >= 2000.
For each period
- first calculate the mean observed value for each country
- then rank countries by increasing MMR for each period.
Calculate the mean ranking across all periods, extract the 10 countries with the lowest ranking across all periods, and print the table.
```{r}
mmr %>%
mutate(before_2000 = year < 2000) %>%
group_by(before_2000, country) %>%
summarise(mean_obs = mean(mmr)) %>%
group_by(before_2000) %>%
mutate(rank = rank(mean_obs)) %>%
group_by(country) %>%
summarize(mean_rank = mean(rank))%>%
arrange(mean_rank) %>%
ungroup() %>%
slice(1:10)
```
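The period split works because `year < 2000` produces a logical column that can be used as a grouping variable. The sketch below runs the same steps on a small made-up data frame (`toy` and its values are hypothetical). It also passes `.groups = "drop"` to `summarise()`, which is one way to avoid the informational "has grouped output by" message; the pipeline above works without it.

```r
library(dplyr)

# Made-up observations for two hypothetical countries.
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1995, 2005, 1998, 2003),
  mmr     = c(100, 60, 40, 30),
  stringsAsFactors = FALSE
)

period_ranks <- toy %>%
  mutate(before_2000 = year < 2000) %>%   # TRUE = period 1, FALSE = period 2
  group_by(before_2000, country) %>%
  summarise(mean_obs = mean(mmr), .groups = "drop") %>%
  group_by(before_2000) %>%
  mutate(rank = rank(mean_obs)) %>%       # rank within each period
  ungroup()

period_ranks
```

In this toy data, country B has the lower mean in both periods, so it is ranked 1 in each period and its mean rank across periods is 1.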
### Part c:
Visualize the results for part a or b
- in a bar chart with countries on the y-axis and mean rank on the x-axis
- with outcomes sorted by increasing rank for the selected countries.
```{r}
mmr %>%
mutate(before_2000 = year < 2000) %>%
group_by(before_2000, country) %>%
summarise(mean_obs = mean(mmr)) %>%
group_by(before_2000) %>%
mutate(rank = rank(mean_obs)) %>%
group_by(country) %>%
summarize(mean_rank = mean(rank))%>%
arrange(mean_rank) %>%
ungroup() %>%
slice(1:10) %>%
ggplot() +
geom_col(mapping = aes(x = fct_reorder(country, desc(mean_rank)),
y = mean_rank)) + theme_bw() +
scale_fill_brewer() + labs(x = "", y = "Mean rank") + coord_flip()
```
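The bar order in the chart comes from the factor levels that `fct_reorder()` assigns. A minimal sketch with made-up values (the vectors `country` and `mean_rank` below are hypothetical), using the `.desc` argument of `fct_reorder()`, which here gives the same level order as wrapping the values in `desc()`:

```r
library(forcats)

# Made-up mean ranks for three hypothetical countries.
country   <- c("A", "B", "C")
mean_rank <- c(3, 1, 2)

# Levels ordered by increasing mean rank.
increasing <- levels(fct_reorder(country, mean_rank))
increasing

# Levels ordered by decreasing mean rank.
decreasing <- levels(fct_reorder(country, mean_rank, .desc = TRUE))
decreasing
```

After `coord_flip()`, the first factor level is drawn at the bottom of the axis, so ordering the levels by decreasing mean rank places the country with the lowest mean rank at the top of the chart.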